... we need a developed system of schemata.

E. H. Gombrich, Art and Illusion, 1959 (p. 76) 
Information exploration should be a joyous experience, but many
commentators talk of information overload and anxiety (Wurman, 1989).
However, there is promising evidence that the next generation of digital
libraries will enable convenient exploration of growing information spaces
by a wide range of users. User-interface designers are inventing more
powerful search and visualization methods, while offering smoother
integration of technology with task.

The terminology swirl in this domain is especially colorful. The older
terms of information retrieval (often applied to bibliographic and textual
document systems) and database management (often applied to more structured
relational database systems with orderly attributes and sort keys) are being
pushed aside by newer notions of information gathering, seeking, filtering, or
visualization. Business-oriented developers focus on the huge volumes of
data when they talk of data mining and warehousing, while expert-system
visionaries talk about knowledge networks. The distinctions are subtle; the
common goals range from finding a narrow set of items in a large collection
that satisfy a well-understood information need (known-item search) to
browsing to discover unexpected patterns within the collection (Marchionini).

Exploring information collections becomes increasingly difficult as the
volume and diversity grow. A page of information is easy to explore, but
when the information representation becomes the size of a book, or library,
or even larger, it may be difficult to locate known items or to browse to
gain an overview. The strategies to focus and narrow are well understood by
librarians and information-search specialists, and now these strategies are
being implemented for widespread use. The computer is a powerful tool for
searching, but traditional user interfaces have been a hurdle for novice users
(complex commands, Boolean operators, unwieldy concepts) and an inadequate
tool for experts (difficulty in repeating searches across multiple databases,
weak methods for discovering where to narrow broad searches, poor
integration with other tools) (Borgman, 1986). This chapter suggests novel
possibilities for first-time or intermittent versus frequent computer users,
and also for task novices versus experts. Improvements on traditional text
and multimedia searching seem possible as a new generation of visualization
strategies for query formulation and information presentation emerges.
Designers are discovering how to use rapid and high-resolution color
displays to present large amounts of information in orderly and
user-controlled ways. Perceptual psychologists, statisticians, and graphic
designers (Bertin, 1983; Cleveland, 1993; Tufte, 1983, 1990) offer valuable
guidance about presenting static information, but the opportunity for dynamic
displays takes user-interface designers well beyond current wisdom.

The objects-actions interface (OAI) model (see Fig. 2.2) helps by separating
task concepts (do you think of your organization as a hierarchy or a
matrix?) from interface concepts (is your hierarchy best represented as
an outline, node-link diagram, or a treemap?). The OAI model also separates
high-level interface issues (are overview diagrams necessary for navigation?)
from low-level interface issues (will color or size coding be used to
represent salary levels?).

First-time users of an information-exploration system (whether they have
little or much task knowledge) are struggling to understand what they see
on the display while keeping in mind their information needs. They would
be distracted if they had to learn complex query languages or elaborate
shape-coding rules. They need the low cognitive burdens of menu and
direct-manipulation designs and simple visual-coding rules. As users gain
experience with the interface, they can request additional features by
adjusting control panels. Knowledgeable and frequent users want a wide range
of search tools with many options that allow them to compose, save, replay,
and revise increasingly elaborate query plans.

To facilitate discussion, we need to define a few terms. Task objects, such as
Leonardo's notebooks or sports-video segments from the Olympics, are
represented by interface objects in structured relational databases, textual
document libraries, or multimedia document libraries. A structured relational
database consists of relations and a schema to describe the relations.
Relations have items (usually called tuples or records), and each item has
multiple attributes (often called fields), which each have attribute values.
In the relational model, items form an unordered set (although one attribute
can contain sequencing information or be a unique key to identify or sort the
other items) and attributes are atomic.



A textual document library consists of a set of collections (typically up to a
few hundred collections per library) plus some descriptive attributes about
the library (for example, name, location, owner). Each collection has a name
plus some descriptive attributes about the collection (for example, location,
media type, curator, donor, dates, geographic coverage), and a set of items
(typically 10 to 100,000 items per collection). Items in a collection may vary
greatly, but usually a moderate-sized superset of attributes exists that covers
all the items. Attributes may be blank, have single values, have multiple
values, or be lengthy texts. A collection is owned by a single library, and an
item belongs to a single collection, although exceptions are possible. A
multimedia document library consists of collections of documents that can
contain images, sound, video, animations, and so on.
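
The library, collection, and item terms defined above can be sketched as a
small data model. This is only an illustration: the class names, the helper
attribute_superset, and the sample titles are assumptions made up for this
sketch, not part of any system described in the chapter.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    # Attribute values may be blank (None), single values,
    # multiple values (a list), or lengthy texts (a long str).
    attributes: dict = field(default_factory=dict)


@dataclass
class Collection:
    name: str
    attributes: dict = field(default_factory=dict)  # e.g. curator, dates
    items: list = field(default_factory=list)

    def attribute_superset(self) -> set:
        """Union of the attribute names used by any item in the collection."""
        names = set()
        for item in self.items:
            names.update(item.attributes)
        return names


@dataclass
class Library:
    attributes: dict  # e.g. name, location, owner
    collections: list = field(default_factory=list)


# Hypothetical sample data; the item titles are invented for illustration.
brady = Collection(
    name="Mathew Brady Civil War photographs",
    attributes={"media type": "photograph"},
    items=[
        Item({"title": "Camp scene", "date": "1862"}),
        Item({"title": "Portrait", "subjects": ["officer", "nurse"]}),
        Item({"title": "Untitled", "date": None}),
    ],
)
library = Library(attributes={"name": "Example library"}, collections=[brady])
print(sorted(brady.attribute_superset()))
```

Items need not share the same attributes; the superset is computed from
whatever the items actually carry, which mirrors the loose structure of
document collections as opposed to the fixed schema of a relation.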

Task actions such as fact finding are decomposed into browsing or searching,
and are represented by interface actions such as scrolling, zooming, joining,
or linking. Users begin by formulating their information needs in the task
domain. Tasks can range from specific fact finding, where there is a single
readily identifiable outcome, to more extended fact finding, with uncertain
but replicable outcomes. Relatively unstructured tasks include open-ended
browsing of known collections and exploration of the availability of
information on a topic:



Specific fact finding (known-item search)

Find the Library of Congress call number of Future Shock.
Find the telephone number of Bill Clinton.
Find the highest-resolution LANDSAT image of College Park
on Dec. 13, 1997.

Extended fact finding 

What other books are by the author of Jurassic Park? 
What genres of music is Sony publishing? 
Which satellites took images of the Persian Gulf War? 
Open-ended browsing

Does the Mathew Brady Civil War photo collection show the role of
women in that war?

Is there new work on voice recognition being reported from Japan?
Is there a relationship between carbon-monoxide levels and
desertification?

Exploration of availability 

What genealogy information is available at the National Archives?

What information is there on the Grateful Dead band members?

Do NASA datasets demonstrate acid-rain damage to soy crops? 

Once users have clarified their information needs, the first step in
satisfying those needs is to decide where to search (Marchionini). The
conversion of information needs, stated in task-domain terminology, to
interface actions is a large cognitive step, but it must be accomplished
before expression of these actions in a query language or via a series of
mouse selections.

Supplemental finding aids can help users to clarify and pursue their
information needs. Examples include tables of contents or indexes in books,
descriptive introductions, concordances, key-word-in-context lists,
and subject classifications. Careful understanding of previous and potential
search requests, and of the task analysis, can improve search results [...].
The Congressional Research Service has a list of approximately [...]
terms in its Legislative Indexing Vocabulary. The National Library of
Medicine [...].

This chapter briefly covers database-query and textual-document searches and
multimedia-document searches, and introduces a four-phase framework. The main
contribution is a taxonomy of information visualization based on data types
and user tasks. The final section explores advanced filtering methods.



15.2 Database Query and Phrase Search in 
Textual Documents 

Searching structured relational databases is a task for which the SQL
language has become a widespread standard (Reisner, 1988). Users write
queries that specify matches on attribute values, such as date of
publication, language, or publisher. Each document has values for the
attributes, and database-management methods enable rapid retrieval even
with millions of documents. For example, an SQL-like command might be

SELECT DOCUMENT*
FROM JOURNAL-DB
WHERE (DATE >= 1994 AND DATE <= 1997)
AND (LANGUAGE = ENGLISH OR FRENCH)
AND (PUBLISHER = ASIS OR HFES OR ACM)
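
The command above is SQL-like shorthand rather than standard SQL. As a rough
sketch of how the same request could be run against a real engine, the
following uses Python's built-in sqlite3 module. The table name journal_db,
the column names (pub_date in place of DATE), and the five sample rows are
assumptions invented for this illustration.

```python
import sqlite3

# In-memory database with a hypothetical journal table and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE journal_db (document INTEGER PRIMARY KEY, pub_date INTEGER,"
    " author TEXT, language TEXT, publisher TEXT)"
)
rows = [
    (1, 1993, "A", "ENGLISH", "ACM"),
    (2, 1995, "B", "FRENCH", "ASIS"),
    (3, 1996, "C", "GERMAN", "ACM"),
    (4, 1997, "D", "ENGLISH", "HFES"),
    (5, 1998, "E", "ENGLISH", "ACM"),
]
conn.executemany("INSERT INTO journal_db VALUES (?, ?, ?, ?, ?)", rows)

# Standard SQL spells "LANGUAGE = ENGLISH OR FRENCH" as an IN list,
# and the inclusive date range as BETWEEN.
query = """
    SELECT document
    FROM journal_db
    WHERE pub_date BETWEEN 1994 AND 1997
      AND language IN ('ENGLISH', 'FRENCH')
      AND publisher IN ('ASIS', 'HFES', 'ACM')
    ORDER BY document
"""
matches = [row[0] for row in conn.execute(query)]
print(matches)  # documents 2 and 4 satisfy all three conditions
```

Even this small query shows where novices can stumble: the shorthand
"= ENGLISH OR FRENCH" reads naturally but has to be rewritten before a
database will accept it.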

SQL has powerful features, but using it requires training (2 to 20 hours), and
even then users make frequent errors for many classes of queries (Welty,
1985). Alternatives such as query-by-example can help users to formulate
simpler queries, such as requesting all English-language ACM articles
published during or after 1994:

DOCUMENT* | DATE   | AUTHOR | LANGUAGE | PUBLISHER
P.        | >=1994 |        | ENGLISH  | ACM



The full set of Boolean expressions, however, is difficult to express except
inside a special condition box.

Form-fillin queries can substantially simplify many queries, and, if the user
interface permits, some Boolean combinations (usually a conjunction of
disjuncts (ORs) within attributes with ANDs between attributes) can be easy to
express:

JOURNAL DATABASE
DOCUMENT*:

DATE: 1994..1997
AUTHOR:
LANGUAGE: ENGLISH, FRENCH
PUBLISHER: ASIS, HFES, ACM
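
The form above maps directly onto a conjunction of disjuncts: ORs within a
field, ANDs between fields. A minimal sketch of that translation follows. The
function name form_to_where and the dictionary layout of the form are
assumptions for illustration; field names are taken from a fixed form
definition, and only the user-entered values are passed as parameters.

```python
def form_to_where(form):
    """Build a parameterized WHERE clause from a filled-in form.

    Values within one field are ORed; the fields themselves are ANDed.
    A tuple is read as an inclusive (low, high) range; blank fields
    place no constraint on the search.
    """
    clauses, params = [], []
    for field_name, value in form.items():
        if value in (None, "", []):
            continue  # blank field: no constraint
        if isinstance(value, tuple):
            clauses.append(f"({field_name} BETWEEN ? AND ?)")
        else:
            disjuncts = " OR ".join(f"{field_name} = ?" for _ in value)
            clauses.append(f"({disjuncts})")
        params.extend(value)
    return " AND ".join(clauses), params


form = {
    "DATE": (1994, 1997),
    "AUTHOR": [],
    "LANGUAGE": ["ENGLISH", "FRENCH"],
    "PUBLISHER": ["ASIS", "HFES", "ACM"],
}
where, params = form_to_where(form)
print(where)
print(params)
```

The user never types AND or OR; the layout of the form carries the Boolean
structure, which is why this style tends to be easier for novices than a
free-form query language.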

Although SQL is a standard, many form-fillin variants for expressing
relational database queries have been proposed to aid novice searchers. The
diversity is itself an impediment to easy use, but designers assume that users
are willing to invest minutes or hours to learn each interface. This
assumption is not valid for walk-up kiosks or for web pages offering
textual-document library searches, in which users are often invited to type
keywords or natural-language queries in a box, and to click on a run button.
This presentation is meant to be appealing, but the computer's capacity for
responding to the natural-language query is often limited to eliminating
frequent terms or commands ("please list the documents that deal with") and
searching for the remaining words. A ranked list of documents is usually
presented, and users must do their best in choosing relevant items from the
list.
Plate B1: A computer program with 4000 lines of code. The newest lines are
in red; the oldest are in blue. The smaller browser window shows a code
overview and detail view. (Used with permission of ATT Bell Labs,
Naperville, IL.)





Plate B2: Information-retrieval themescape, showing a multidimensional
information space pressed down into a two-dimensional topological map. Some
clustering of points can be interesting, but they carry the danger of
misinterpretations of the meaning of adjacency. (Wise et al., 1995.) (Used
with permission of Battelle Pacific Northwest National Laboratory, Richland,
WA.)
Plate B3: Medical version of Lifelines. Physician visits, conditions,
hospitalizations, and medications are shown. Every item in the record is seen
as a line or icon in the overview, with color coding by doctor. Line thickness
indicates severity and dosage. Windows on the right side show the details.
(Plaisant et al., 1997.)




Plate B4 (a): FilmFinder showing 1500 films in a starfield display, where the
location of each point is determined by the year of the film (x axis) and by
the film's popularity in video store rentals (y axis). The color encodes the
film type.




Plate B4 (b): FilmFinder after zooming in on recent popular films. When
fewer than 25 films remain, the titles appear automatically.







Plate B4 (c): FilmFinder after selection of a single film. The info card pops
up with details on demand. (Ahlberg and Shneiderman, 1997.)
Plate B5: Spotfire version of FilmFinder, which provides increased user
controls. Users can set axes (set to length in minutes and year) and glyph
attributes (color is set to subject, and larger size indicates award-winning
film). (Used with permission of IVEE Development, Göteborg, Sweden.)
(http://www.ivee.com)




Plate B6: Telephone network traffic represented by thickness and 
color of the half-line segments between cities. (Used with permission 
of ATT Bell Labs, Naperville, IL.) 






Although an easy-to-use interface is a good idea, if users cannot express
their intentions or are uncertain about the meaning of the results, then the
interface may need improvement. Finding a way to provide powerful search
without overwhelming novice users is a current challenge. Existing
interfaces often hide important aspects of the search (by poor design or to
protect [...]) or make query specification so difficult [...] that they
discourage use. Evidence from empirical studies shows that users perform
better and have higher subjective satisfaction when they can view and control
the search (Koenemann and Belkin, 1996).

An analogy to the evolution of automobile user interfaces might clarify
the situation. Early competitors offered a profusion of controls, and each
manufacturer had a distinct design. Some designs, such as having a brake
that was far from the gas pedal, were dangerous. Furthermore, if you were
accustomed to driving a car with the brake to the left of the gas pedal and
your neighbor's car had the reverse design, it might be risky to trade cars.
It took a half-century to achieve good design and appropriate consistency in
automobiles; let's hope that we can make the transition faster for text-search
user interfaces.

Improved designs and consistency across multiple systems can bring
faster performance, reduce mistaken assumptions, and increase success in
finding relevant items. For example, with the variety of web search systems
such as Lycos, Infoseek, and AltaVista, users might expect that the search
string direct manipulation would produce one of the following:

• Search on the exact string direct manipulation

• Probabilistic search for direct and manipulation

• Probabilistic search for direct and manipulation, with some
weighting if the terms are in close proximity

• Boolean search on direct AND manipulation

• Boolean search on direct OR manipulation

• Error message indicating missing AND/OR operator or other [...]

In many systems, there is little or no indication regarding which
interpretation has been chosen and whether stemming, case matching, stop
words, or other transformations are being applied. Often, the results are
displayed in a relevance-ranked manner that is a mystery to many users (and
sometimes is a proprietary secret).
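
To make the ambiguity concrete, here is a small sketch of three of the
interpretations listed above applied to the same search string. The four
sample documents and every function name are assumptions invented for this
illustration; the probabilistic and proximity-weighted variants are left out
to keep the sketch short.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def exact_phrase(query, doc):
    """True if the query words occur consecutively in the document."""
    q, d = tokenize(query), tokenize(doc)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


def boolean_and(query, doc):
    """True if every query word occurs somewhere in the document."""
    return set(tokenize(query)) <= set(tokenize(doc))


def boolean_or(query, doc):
    """True if at least one query word occurs in the document."""
    return bool(set(tokenize(query)) & set(tokenize(doc)))


INTERPRETATIONS = {
    "exact phrase": exact_phrase,
    "Boolean AND": boolean_and,
    "Boolean OR": boolean_or,
}

# Made-up documents for illustration only.
DOCS = {
    "d1": "Direct manipulation interfaces show objects of interest.",
    "d2": "Users direct the robot arm; manipulation is remote.",
    "d3": "Menus offer direct access to commands.",
    "d4": "Form fillin suits data entry.",
}


def search(query, interpretation):
    """Run the query and report which interpretation was applied."""
    matcher = INTERPRETATIONS[interpretation]
    matches = sorted(n for n, text in DOCS.items() if matcher(query, text))
    return {"interpretation": interpretation, "matches": matches}


for name in INTERPRETATIONS:
    print(search("direct manipulation", name))
```

The same two words return one, two, or three documents depending on the
interpretation, so returning the interpretation alongside the matches, as
search does here, is one simple way to keep the user informed.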

To coordinate design practice, we might use a four-phase framework to
satisfy the needs of first-time, intermittent, and frequent users who are
accessing a variety of textual and multimedia libraries (Shneiderman et al.,
1997). Finding common ground will be difficult; not finding it will be tragic.
Although early adopters of technology are willing to overcome difficulties,
the middle and late adopters are not so tolerant. The future of search services
on the World Wide Web and elsewhere may depend on the degree to which
user frustration and confusion are reduced, while the ability to reliably find
sought items in the surging sea of information is increased.

The four-phase framework (Box 15.1) gives great freedom to designers to 
offer features in an orderly and consistent manner. The phases are 

1. Formulation: expressing the search

2. Initiation of action: launching the search 

3. Review of results: reading messages and outcomes 

4. Refinement: formulating the next step 

Formulation includes the source of the information, the fields for limiting
the source, the phrases, and the variants. Even if technically and economically
feasible, searching all libraries or collections in a library is not always the
preferred approach. Users often prefer to limit the sources to a specific
library, collection in a library, or subcollection range of items (users may
choose date ranges, languages, media types, publishers, and so on). Users
may wish to limit their search to specific fields (for example, the title,
abstract, or full text of a scientific article) of items within a collection.
Typically, users searching on common phrases would prefer to retrieve only
those documents whose title contains those phrases. Sources may also be
restricted by structured fields (year of publication, volume number, and so
on).

In textual databases, users often seek items that contain meaningful phrases
(Civil War, Environmental Protection Agency, George Washington, air
pollution, carbon monoxide), and multiple entry windows should be provided to
allow for multiple phrases. Searches on phrases have proved to be more
accurate than are searches on words. Since some relevant items may be missed
by a phrase approach, users should have the option to expand a search by
breaking the phrases into separate words. Phrases also facilitate searching on
names (for example, search on George Washington should not turn up George
Bush or Washington, D.C.). If Boolean operations, proximity restrictions, or
other combining strategies are specifiable, then the users should be able to
express them. Users or service providers should have control over stop lists
(common words, single letters, obscenities).

Box 15.1

When users are unsure of the exact value of the field (subject term, or
spelling or capitalization of a city name), they may want to relax the search
constraints by allowing variants to be accepted. In structured databases,
variants may include a wider range on a numeric attribute. In a
textual-document search, interfaces should allow user control over variant
capitalization (case sensitivity), stemmed versions (the keyword teach
retrieves variant suffixes such as teacher, teaching, or teaches), partial
matches (the keyword biology retrieves sociobiology and astrobiology),
phonetic variants from soundex methods (the keyword Johnson retrieves Jonson,
Jansen, and Johnsson), synonyms (the keyword cancer retrieves oncology),
abbreviations (the keyword IBM retrieves International Business Machines, and
vice versa), and broader or narrower terms from a thesaurus (the keyphrase New
England retrieves Vermont, Maine, Rhode Island, New Hampshire, Massachusetts,
and Connecticut).
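
As one concrete case of variant matching, the phonetic example above can be
reproduced with the classic American Soundex coding, which keeps the first
letter and encodes the following consonants as up to three digits. The sketch
below is a plain implementation of that well-known scheme; the helper name
phonetic_variants and the candidate name list are assumptions added for
illustration.

```python
# Consonant groups of the classic Soundex scheme; vowels, h, w, and y
# receive no digit.
CODES = {}
for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character Soundex code of a name."""
    letters = [ch for ch in name.lower() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (first.upper() + "".join(digits) + "000")[:4]


def phonetic_variants(keyword, candidates):
    """Return the candidates that share the keyword's Soundex code."""
    target = soundex(keyword)
    return [c for c in candidates if soundex(c) == target]


names = ["Jonson", "Jansen", "Johnsson", "Jackson", "Washington"]
print(soundex("Johnson"))
print(phonetic_variants("Johnson", names))
```

Because such matching is deliberately loose, it fits the text's advice that
variant matching be placed under user control rather than applied silently.
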
The second phase is the initiation of action, which may be explicit or
implicit. Most current systems have a search button for explicit initiation, or
for delayed or regularly scheduled initiation. The button label, size, and
color should be consistent across versions. An appealing alternative is
implicit initiation, in which each change to a component of the formulation
phase immediately produces a new set of search results. Dynamic queries, in
which users adjust query widgets to produce continuous updates, have
proved to be effective and satisfying. They require adequate screen space
and rapid processing, but their advantages are great.

The third phase is the review of results, in which the users read mes- 
sages, view textual lists, or manipulate visualizations. Users may be given 
control over what the size of the result set is, which fields are displayed, 
how results are sequenced (alphabetical, chronological, relevance 
ranked), and how results are clustered (by attribute value, by topics) 
(Pirolli et al., 1996). 

The fourth phase is the refinement. Search interfaces should provide
meaningful messages to explain search outcomes and to support progressive
refinement. For example, if a stop word, obscenity, or misspelling is
eliminated from a search input window, or if stemmed terms, partial matches, or
variant capitalizations are included, users should be able to see these
changes to their query. If two words in a keyphrase are not found proximally,
then feedback should be given about the occurrence of the words
individually. If multiple phrases are input, then items containing all phrases
should be shown first and identified, followed by items containing subsets;
but if no documents are found with all phrases, that failure should be
indicated. There is a fairly elaborate decision tree (maybe 60 to 100
branches) of search outcomes and messages that needs to be specified. Another
aspect of feedback is that, as searches are made, the system should keep track
of them in a history buffer to allow review of earlier searches. Progressive
refinement, in which the results of a search are refined by changing of the
search parameters, should be convenient. Search results and the settings of
all parameters should be objects that can be saved, sent by electronic mail,
or used as input to other programs (for example, visualization or statistical
tools).
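
Two of the refinement behaviors described above, showing items that contain
all phrases first and explaining failures, can be sketched in a few lines.
This covers only a tiny corner of the decision tree the text mentions. The
function name review_phrases, the message wording, the sample documents, and
the use of plain substring matching are all simplifying assumptions.

```python
def review_phrases(phrases, docs, history):
    """Rank documents by number of matched phrases and explain the outcome."""
    wanted = [p.lower() for p in phrases]
    hits = {}
    for doc_id, text in docs.items():
        found = [p for p in wanted if p in text.lower()]  # substring match
        if found:
            hits[doc_id] = found
    # Documents containing all phrases come first, then partial matches.
    ranked = sorted(hits, key=lambda d: (-len(hits[d]), d))
    messages = []
    if not any(len(found) == len(wanted) for found in hits.values()):
        messages.append("No documents contain all phrases.")
    for p in wanted:
        if not any(p in found for found in hits.values()):
            messages.append(f"Phrase not found in any document: {p}")
    # Keep each search in a history buffer so earlier searches can be reviewed.
    history.append({"phrases": list(phrases), "results": ranked})
    return ranked, messages


# Made-up documents for illustration only.
docs = {
    "a": "Air pollution and carbon monoxide levels rose.",
    "b": "Carbon monoxide detectors save lives.",
    "c": "The Civil War ended in 1865.",
}
history = []
first = review_phrases(["air pollution", "carbon monoxide"], docs, history)
second = review_phrases(["air pollution", "acid rain"], docs, history)
print(first)
print(second)
```

The history entries are plain dictionaries, so they could be saved or passed
to other programs, in the spirit of treating searches and settings as objects.
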
The four-phase framework can be applied by designers to make the
search process more visible, comprehensible, and controllable by users. This
approach is in harmony with movement toward direct manipulation, in
which the state of the system is made visible and is placed under user
control. Novices may not want to see all the components of the four phases
initially, but, if they are unhappy with the search results, they should be
able to view the settings and change their queries easily. A revised interface
for the Library of Congress' THOMAS system (Fig. 15.1) shows how the framework
might be applied to full-text searching of proposed legislation.



Figure 15.1

A revised interface for the Library of Congress' THOMAS system. The display
shows how the four-phase framework might be applied to text searching on
Congressional Record articles. Implemented by Bryan Savin at the University of
Maryland Human-Computer Interaction Laboratory. (Shneiderman et al., 1997.)



15.3 Multimedia Document Searches



Interfaces to search structured databases and textual-document libraries are
good and getting better, but searching in multimedia document libraries is
still in a primitive stage. Current approaches to locating images, videos,
sound, or animation depend on a parallel database or document search to
locate the items. For example, searches in photo libraries can be done by
date, photographer, medium, location, or text in captions, but finding photos
showing a ribbon-cutting ceremony or videos of a sunset is difficult. In the
near term, those people who must search multimedia documents should
push for ambitious captioning and attribute recording. Classification
according to useful search categories (agriculture, music, sports,
personalities) is helpful, although costly and imperfect.

Recent advances in computer algorithms may enable greater flexibility in
locating items in multimedia libraries. User-interface designs to specify the
permissible matches are varied. Some systems have elaborate textual commands,
but most are moving toward graphical specification of query components:

• Photo search Finding photos with images such as the Statue of Liberty 
is a substantial challenge for image-analysis researchers, who describe 
this task as query by image content (QBIC). Lady Liberty's distinctive 
profile might be identifiable if the orientation, lens focal length, and 
lighting were held constant, but the general problem is difficult in large 
and diverse collections of photos. Two promising approaches are to 
search for distinctive features such as the torch or the seven spikes in
the crown, or to search for distinctive colors, such as the faded green 
copper verdigris. Users can specify features or color patterns with stan- 
dard drawing tools, and even can indicate where in the image to 
search. For example, users could specify red, white, and blue in the 
upper third of an image to look for an American flag flying above a 
building. Of course, separating out the British, French, or other flags is 
not easy. 

More success is attainable with restricted collections, such as of glass
vases, for which users could draw a desired profile and retrieve vases
with long narrow necks. Other candidate collections include photos of
constellations, subatomic particle tracks, or red blood cells. Users could
specify their requests by selecting from a set of templates and adjusting
the templates to describe their query. For critical applications, such as
fingerprint matching, current successes depend on human identification
of as many as 20 distinct features, but automatic recognition is
improving. Even if completely automatic recognition is not possible, it
will still be useful to have computers perform filtering, such as finding
all the portraits with neutral backgrounds in a photo library.

• Map search Computer-generated maps are increasingly available
online. Locating a map by latitude and longitude is the structured-database
solution, but search by features is becoming possible because
the tools used to build maps preserve the structural aspects and the
multiple layers in maps. For example, users might specify a search for
all port cities with a population greater than 1 million and an airport
within 10 miles. Search on simpler maps such as airline routes might
find flights to a given destination with no more than two connections
on the same airline. Another candidate is weather maps, in which
structured data, such as temperature, winds, or barometric pressure,
make the search specification convenient.

• Design or diagram search Some computer-assisted design packages
offer users limited search capabilities within a single design or across
design collections. Finding red circles inside blue squares may help in
some cases, but more elaborate strategies for finding engine designs
with pistons smaller than 6 centimeters could prove more beneficial.
Diagramming tools for making flowcharts or organization charts can
add search capabilities to locate organizations that have more than five
levels of management or situations where vice presidents are managing
more than seven projects. Newspaper-layout packages could allow
search for all occasions of headlines using fonts larger than 48 points, or
headlines that span the front page.

• Sound search Imagine a music database system that would respond
when users hum a few notes by producing a list of symphonies that contain
that string of notes. Then, with a single touch, users could listen to the
full symphony. Implementing this idea in the unstructured world of
analog-encoded or even digitally encoded music is difficult, but imagine that
the score sheets of symphonies were stored with the music and that string
search over the score sheets was possible. Then, the application becomes
easier to conceive. Identification of the users' hummed input might not be
reliable, but if visual feedback were provided or if users entered the notes
on a staff, then the fantasy would become feasible. Finding a spoken word
or phrase in databases of telephone conversations is still difficult, but is
becoming possible, even on a speaker-independent basis.

• Video search Searching a video or film involves more than simply
searching through each of the frames. Users may wish to have a video
segmented into scenes or cuts, and to identify zooming in or out and
panning left or right. Gaining an overview of a 2-hour video by a time
line of scenes would enable better understanding, editing, or selection.
Combinations of structured databases and textual documents with
video libraries lead to powerful services. Television news or sports
libraries maintain structured databases and textual documents to support
search for presidential appearances, disasters, or football highlights,
carefully indexed for rapid future retrieval.

• Animation search Animation-authoring tools are still in early stages of
development, but it might be possible to specify searches for certain
kinds of animation, for example, spinning globes, moving banners,
bouncing balls, or morphing faces. Although it might be less useful, it
should be relatively easy to search for slides in a presentation that have
moving text that comes in from the left, or in which the transition from
one slide to another is by a barndoor animation.
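
The sound-search idea in the list above, string search over stored score
sheets, can be sketched by comparing the steps between successive notes
rather than the notes themselves, so that a tune hummed in a different key
still matches. Representing notes as MIDI pitch numbers, the function names,
and the two short sample scores are assumptions made for this illustration.

```python
def intervals(notes):
    """Convert pitch numbers into the semitone steps between neighbors."""
    return [b - a for a, b in zip(notes, notes[1:])]


def find_melody(hummed, score):
    """Return start positions in score where the hummed contour occurs.

    Matching on intervals makes the search independent of the key
    in which the user happens to hum.
    """
    pattern, target = intervals(hummed), intervals(score)
    if not pattern:
        return []
    n = len(pattern)
    return [i for i in range(len(target) - n + 1) if target[i:i + n] == pattern]


def search_scores(hummed, scores):
    """Return the names of all stored scores that contain the hummed notes."""
    return sorted(name for name, notes in scores.items()
                  if find_melody(hummed, notes))


# Illustrative scores as MIDI pitch numbers (60 = middle C).
scores = {
    "Beethoven, Symphony No. 5 (opening)": [67, 67, 67, 63, 65, 65, 65, 62],
    "Ascending scale exercise": [60, 62, 64, 65, 67],
}
hummed = [60, 60, 60, 56]  # same contour as the opening, hummed lower
print(search_scores(hummed, scores))
```

Exact interval matching assumes the hummed input has already been transcribed
reliably; as the text notes, that step is the hard part, which is why visual
feedback or entry of notes on a staff would help.
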
15.4 Information Visualization



Grasping the whole is a gigantic theme. Arguably, intellectual history's most
important. Ant-vision is humanity's usual fate; but seeing the whole is every
thinking person's aspiration.

David Gelernter, Mirror Worlds, 1992

Visualization is a method of computing. It transforms the symbolic into the
geometric, enabling researchers to observe their simulations and computations.
Visualization offers a method for seeing the unseen. It enriches the process of
scientific discovery and fosters profound and unexpected insights. In many
fields it is already revolutionizing the way scientists do science.

McCormick et al., 1987






The success of direct-manipulation interfaces is indicative of the power of
using computers in a more visual or graphic manner. A picture is often said to
be worth a thousand words and, for some tasks, a visual presentation, such
as a map or photograph, is dramatically easier to use or comprehend than
a textual description or a spoken report. As computer speeds and display
resolution increase, information visualization and graphical interfaces are
likely to have an expanding role. If a map of the United States is displayed,
then it should be possible to point rapidly at one of 1000 cities to get
tourist information. Of course, a foreigner who knows a city's name (for
example, New Orleans), but does not know its location, may do better with a
scrolling alphabetical list. Visual displays become even more attractive to
provide orientation or context, to enable selection of regions, and to provide
dynamic feedback for identifying changes (for example, on a weather map).
Scientific visualization has the power to make visible and comprehensible
atomic, cosmic, and common three-dimensional phenomena (such as heat
conduction in engines, airflow over wings, or ozone holes).
Abstract-information visualization has the power to reveal patterns, clusters,
gaps, or outliers in statistical data, stock-market trades, computer
directories, or document collections.

Overall, the bandwidth of information presentation is potentially higher in
the visual domain than it is for media reaching any of the other senses.
Humans have remarkable perceptual abilities that are greatly underutilized in
current designs. Users can scan, recognize, and recall images rapidly, and can
detect subtle changes in size, color, shape, movement, or texture. They can
point to a single pixel, even in a megapixel display, and can drag one object
to another to perform an action. User interfaces thus far have been largely
text-oriented, so as visual approaches are explored, appealing new
opportunities are emerging.
There are many visual design guidelines. The central principle might be
summarized as this visual-information-seeking mantra:

Overview first, zoom and filter, then details on demand 
Overview first, zoom and filter, then details on demand 
Overview first, zoom and filter, then details on demand 
Overview first, zoom and filter, then details on demand 
Overview first, zoom and filter, then details on demand 
Overview first, zoom and filter, then details on demand 
Overview first, zoom and filter, then details on demand 
Overview first, zoom and filter, then details on demand 
Overview first, zoom and filter, then details on demand 
Overview first, zoom and filter, then details on demand 
Overview first, zoom and filter, then details on demand 
Overview first, zoom and filter, then details on demand 

Each line represents one project in which I found myself rediscovering
this principle and therefore wrote it down as a reminder. The mantra
proved to be a good starting point when I was trying to characterize the
multiple information-visualization innovations occurring at university,
government, and industry research laboratories. To sort out the numerous
prototypes and to guide researchers to new opportunities, Box 15.2 gives a
data type by task taxonomy (TTT) of information visualizations.

As in the case of search, users are assumed to be viewing collections of
items, where items have multiple attributes. In all seven data types (one-,
two-, and three-dimensional data; temporal and multidimensional data; and
tree and network data) the items have one or more attributes. A basic search
task is to select all items that match target attributes, for example, find all
divisions in a company that have a budget greater than $500,000.
The data types of the TTT characterize the task-domain information
objects and are organized by the problems that users are trying to solve.
For example, in two-dimensional information such as maps, users are trying
to grasp adjacency or to navigate paths, whereas in tree-structured
information, users are trying to understand parent-child-sibling
relationships. The tasks in the TTT are task-domain information actions that
users wish to perform.
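
The basic search task described above (select all items whose attributes match
a target) can be sketched in a few lines. This is only an illustration: the
Division record, the select helper, and the sample budgets are hypothetical
and are not taken from any system discussed in this chapter.

```python
from dataclasses import dataclass


@dataclass
class Division:
    """One item in the collection; the attributes here are illustrative."""
    name: str
    budget: int  # dollars


def select(items, predicate):
    """Return the items whose attributes satisfy the predicate."""
    return [item for item in items if predicate(item)]


# Hypothetical sample data.
divisions = [
    Division("Research", 750_000),
    Division("Training", 120_000),
    Division("Sales", 500_000),
]

# "Find all divisions that have a budget greater than $500,000."
big = select(divisions, lambda d: d.budget > 500_000)
print([d.name for d in big])
```

Passing the condition as a function keeps the selection step independent of
which attributes a particular data type happens to carry.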

The seven tasks are at a high level of abstraction. Refinements and
additions to these tasks would be natural next steps in expanding this
taxonomy. The seven tasks are overview, zoom, filter, details-on-demand,
relate, history, and extract. Further discussion of the seven tasks follows
the descriptions of the seven data types.

Box 15.2

Data Type by Task Taxonomy (TTT) to identify visualization data types and the
tasks that need to be supported.




1-D Linear Data 

Linear data types include textual documents, program source code, and
alphabetical lists of names, all of which are organized in a sequential
manner. Each item in the collection is a line of text containing a string of
characters. Additional attributes might be the date of most recent update
or author name. Interface-design issues include what fonts, color, and size to
use, and what overview, scrolling, or selection methods can be used. User
tasks might be to find the number of items, to see items having certain
attributes (show only lines of a document that are section titles, lines of a
program that were changed from the previous version, or people in a list
who are older than 21 years), or to see an item with all its attributes.
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artide; the older and newer issue, of ft w "'f 

on the bookshelf with decreasing space. [...]
one-dimensional data showed the [...]



Figure 15.2 

Each value bar shows one attribute of the linear list of files [...]. For
example, the two value bars on the right represent the [...] recency, or
youth (Y). The currently selected file [...] in size and is moderately
youthful. (Chimera, 1992.)

Figure 15.3

Shakespeare's Hamlet, viewed with a one-dimensional overview on the right side
showing where three users are reading the document. Each person's field-of-view
box shows their location. This user is at the start. (Used with permission of
the University of Calgary, Alberta, Canada.)



data about sunspots using the information-mural algorithms (Fig. 15.5)
(Jerding and Stasko, 1995).

2-D Map data 

Planar or map data include geographic maps, floorplans, and newspaper
layouts. Each item in the collection covers some part of the total area and
may or may not be rectangular. Each item has task-domain attributes,
such as name, owner, and value, and interface-domain features, such as
size, color, and opacity. Many systems adopt a multiple-layer approach to
dealing with map data, but each layer is two-dimensional. User tasks are
to find adjacent items, containing items, and paths between items, and to
perform the seven basic tasks.

Examples include geographic-information systems, which are a
large research and commercial domain (Laurini and Thompson, 1992;
Egenhofer and Richards, 1993) with numerous systems available (see
Fig. 6.5). Information-visualization researchers have used spatial
displays of document collections (Color Plate B2) (Korfhage, 1991;
Hemmje et al., 1993; Wise et al., 1995) organized proximally by term
co-occurrences.

Figure 15.5

An information mural, which uses an antialiasing algorithm to show 52,000
readings of sunspots from 1850 to 1993. The field-of-view box at the bottom
shows the context for the detail view on top (Jerding and Stasko, 1995). (Used
with permission of Georgia Tech University, Atlanta, GA.)

3-D World 

Real-world objects such as molecules, the human body, and buildings have
items with volume and with potentially complex relationships with other 
items. Computer-assisted design systems for architects, solid modelers, 
and mechanical engineers are built to handle complex three-dimensional 
relationships. Users' tasks deal with adjacency plus above-below and 
inside-outside relationships, as well as the seven basic tasks. In three- 
dimensional applications, users must cope with their position and orienta- 
tion when viewing the objects, plus must handle the serious problems of 
occlusion. Solutions are proposed in many prototypes with techniques such 
as overviews, landmarks, perspective, stereo display, transparency, and 
color coding. 

Examples of three-dimensional computer graphics and computer-assisted
design are numerous, but information-visualization work in three
dimensions is still novel. Some virtual-environment researchers have
sought to present information in three-dimensional structures (see Section
6.8). Navigating high-resolution images of the human body is the challenge
in the National Library of Medicine's Visible Human project (Fig. 15.6)
(North et al., 1996). Architectural walkthroughs or flythroughs can give
users an idea of what a finished building will look like. A three-dimensional
desktop is thought to be appealing to users, but disorientation, navigation,
and hidden data problems remain (Fig. 15.7) (Card et al., 1996).

Temporal data 

Time lines are widely used and are sufficiently vital for medical records,
project management, or historical presentations that researchers have
created a data type that is separate from one-dimensional data. The
distinctions of temporal data are that items have a start and finish time, and
that items may overlap. Frequent tasks include finding all events before, after,
or during some time period or moment, plus the seven basic tasks.
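
The frequent temporal tasks named above (finding events before, after, or
during a period) reduce to comparisons on start and finish times. The sketch
below is a minimal illustration under stated assumptions: times are plain
integers, "during" is taken to mean any overlap with a closed period, and the
Event record and sample project phases are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    label: str
    start: int   # hypothetical time units, for example day numbers
    finish: int  # assumed to satisfy finish >= start


def before(events, t):
    """Events that finish strictly before time t."""
    return [e for e in events if e.finish < t]


def after(events, t):
    """Events that start strictly after time t."""
    return [e for e in events if e.start > t]


def during(events, lo, hi):
    """Events that overlap the closed period [lo, hi] at any point."""
    return [e for e in events if e.start <= hi and e.finish >= lo]


# Hypothetical project phases; note that they overlap.
events = [Event("design", 1, 10), Event("build", 8, 20), Event("test", 18, 25)]

print([e.label for e in during(events, 9, 9)])  # events active at moment 9
```

Because items may overlap, a moment can match several events at once, which
is one reason temporal data are treated separately from one-dimensional data.
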
Many project-management tools exist; novel visualizations of time
include the perspective wall (Fig. 15.8) (Robertson et al., 1993) and
LifeLines (see Fig. 1.5 and Color Plate B3) (Plaisant et al., 1996). LifeLines
shows a youth's history keyed to the needs of the Maryland Department
of Juvenile Justice, but is intended to present medical patient histories as a
compact overview with selectable items that allow users to get
details-on-demand. Temporal-data visualizations appear in systems for editing
video data, composing music, or preparing animations, such as Macromedia
Director.
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Figure 15.6 



[...] view on the left, and an axial preview image of the upper [...]



Multidimensional data 

Most relational- and statistical-database contents are conveniently
manipulated as multidimensional data, in which items with n attributes become
points in an n-dimensional space. The interface representation can be
two-dimensional scattergrams with each additional dimension controlled by a
slider (Ahlberg and Shneiderman, 1994). Buttons can be used for attribute
values when the cardinality is small, say, less than 10. Tasks include finding
patterns, clusters, correlations among pairs of variables, gaps, and outliers.
Multidimensional data can be represented by a three-dimensional scattergram,
but disorientation (especially if the user's point [...] close points are
represented as being larger) can be problems. The technique of [...] is a
[...] innovation that [...] to comprehend [...].

The early HomeFinder (Fig. 15.10) developed dynamic queries and sliders for
user-controlled visualization of multidimensional data [...]. The successor
FilmFinder (Color Plate [...]) refined the techniques (Ahlberg and
Shneiderman, 1994) for starfield displays (zoomable, color-coded,
user-controlled scattergrams), and laid the basis for the commercial product
Spotfire (Color Plate B5) (Ahlberg and Wistrand, 1995). Extrapolations include
the Aggregate Manipulator (Goldstein and Roth, 1994), movable filters (Fishkin
and Stone, 1995), and Selective Dynamic Manipulation (Chuah et al., 1995).
Related works include VisDB for multidimensional database visualization
(Keim and Kreigal, 1994), the spreadsheet-like Table Lens (Fig. 15.11) (Rao
and Card, 1994), and the multiple linked histograms in the Influence
Explorer (Tweedie et al., 1996).

Figure 15.7

WebBook and WebForager. These three-dimensional worlds are used for browsing
and recording web pages. (Used with permission from Xerox PARC, Palo Alto, CA.)

Figure 15.8

A perspective wall, showing time moving from left to right, with the focus in
the center. Different categories of programs are shown on each level of the
wall. Color or size coding can be used. (Used with permission from Xerox PARC,
Palo Alto, CA.)

Tree data 

Hierarchies or tree structures are collections of items, in which each
item (except the root) has a link to one parent item. Items and the links
between parent and child can have multiple attributes. The basic tasks
can be applied to items and links, and tasks related to structural
properties become interesting, for example, how many levels are in the
tree, or how many children does an item have? While it is possible to
have similar items at leaves and internal nodes, it is also common to
find different items at each level in a tree. Fixed-level trees, with all
leaves equidistant from the root, and fixed-fanout trees, with the same
number of children for every parent, are easier to handle. High-fanout
(broad) and small-fanout (deep) trees are important special cases.
Interface representations of trees can use the outline style of indented
labels used in tables of contents (Chimera and Shneiderman, 1994), a
node-and-link diagram, or a treemap, in which child items are rectangles
nested inside parent rectangles.

Figure 15.9

[...] connecting lines, as in many computer-directory [...]
three-dimensional cone (Fig. 15.12) and cam trees (Robertson et al., 1993;
Carriere and Kazman, 1995), dynamic pruning in the TreeBrowser (Fig. 15.13)
(Kumar et al., 1997), and the appealingly animated hyperbolic trees
(Fig. 15.14) (Lamping et al., 1995).
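
The treemap idea mentioned above, in which child rectangles are nested inside
parent rectangles, can be sketched as a recursive layout. The version below is
a simple slice-and-dice style layout written for illustration, assuming that
each leaf has a numeric size and that the split direction alternates by level;
the tuple format and the sample tree are hypothetical, and this is not
presented as the layout used by any particular treemap program.

```python
def weight(node):
    """Size of a leaf, or the summed size of all leaves below a parent."""
    _, size, children = node
    return size if not children else sum(weight(c) for c in children)


def treemap(node, x, y, w, h, depth=0, out=None):
    """Assign each node a rectangle (x, y, w, h) nested inside its parent's.

    `node` is a (name, size, children) tuple. The split direction alternates
    with depth: even levels divide the width, odd levels divide the height.
    """
    if out is None:
        out = {}
    name, _, children = node
    out[name] = (x, y, w, h)
    total = sum(weight(c) for c in children)
    if total == 0:
        return out  # leaf, or children with no area to share
    offset = 0.0
    for child in children:
        share = weight(child) / total
        if depth % 2 == 0:
            treemap(child, x + offset * w, y, w * share, h, depth + 1, out)
        else:
            treemap(child, x, y + offset * h, w, h * share, depth + 1, out)
        offset += share
    return out


# Hypothetical directory tree: sizes are attached to leaves only.
root = ("root", 0, [
    ("a", 6, []),
    ("b", 0, [("b1", 3, []), ("b2", 1, [])]),
])
rects = treemap(root, 0.0, 0.0, 100.0, 50.0)
for name, rect in rects.items():
    print(name, rect)
```

Each rectangle's area is proportional to the total size beneath that node, so
large items stand out without any connecting lines.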
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Figure 15.10 



[...] on the right, the [...]

[...] (Shneiderman, 1992; Johnson and [...]) [...] (Fig. 15.16), sales data,
business decision making (Asahi et al., 1995) [...] minutes to accommodate
to treemaps.



Network data

Figure 15.11

Table Lens, a program that provided a spreadsheet-like world that also
supported [...] players. (Used with permission from Xerox PARC, Palo Alto, CA.)



[...] convenient to consider them all as one data type. In addition to
the basic tasks applied to items and links, network users often want to
know about shortest or least costly paths connecting two items or traversing
the entire network. Interface representations include a [...] with the value
of a link attribute in the row and column representing a link.

[...] of relationships and user tasks. Commercial packages can
handle small networks or simple strategies, such as Netmap's layout [...]
crossing the central area. Specialized visualizations can be designed to be
more effective for a given task [...] diagram showing heavy telephone traffic
on holidays [...]. An ambitious three-dimensional approach allowed users
to navigate and control the visualization (Fairchild et al., 1988).
[...] topic has been spawned by attempts to visualize the World
Wide Web (Andrews, 1995; Hendley et al., 1995).
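
The network task of finding a shortest or least costly path between two items
can be illustrated with a standard method. The sketch below uses Dijkstra's
algorithm, which is a common choice for non-negative link costs and is not
claimed to be what any of the cited systems use; the dictionary format for
links and the sample network are assumptions made for the example.

```python
import heapq


def least_cost_path(links, source, target):
    """Dijkstra's algorithm.

    `links` maps node -> {neighbor: cost}; costs are assumed non-negative.
    Returns (total_cost, path), or (None, []) if target is unreachable.
    """
    best = {source: 0}
    previous = {}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for neighbor, step in links.get(node, {}).items():
            new_cost = cost + step
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if target not in best:
        return None, []
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[target], path[::-1]


# Hypothetical directed network with a link-cost attribute.
links = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 1},
    "C": {"B": 2, "D": 7},
    "D": {},
}
print(least_cost_path(links, "A", "D"))
```

An interface could highlight the returned path on a node-and-link diagram, or
mark the corresponding cells in a matrix representation.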

[...] abstraction of the reality. There are many variations on these themes
(two-and-[...]

Figure 15.12

[...] shown at the top [...] When the root of the tree [...] (Used with
permission from Xerox PARC, Palo Alto, CA.)

[...]



Overview task 



We can gain an overview of the entire collection. [...] (Section 13.5)
include zoomed-out views [...] type that allow the user to see the entire
collection, plus [...] contains a [...] box [...] controls the contents of
the detail view, allowing [...] zoom factors. Another popular approach is the
fisheye strategy [...], which has been applied [...] (Fig. 15.17) (Sarkar and
Brown, 1994; Bartram et al., 1995; Schaffer et al., 1996). The fisheye
distortion magnifies one or more areas of the display, but zoom factors in
prototypes are limited to about 5. Although query-language facilities made it
difficult to gain an overview of a collection, information-visualization
interfaces support some overview strategy, or should do so. Adequate overview
strategies are a useful criterion to judge such interfaces. In addition, look
for navigation tools to pan or scroll through the collection.

Figure 15.13

PDQ TreeBrowser, which supports pruning of nodes at every level of a tree. A
user has pruned an 1100-node tree of a satellite network, using dynamic query
sliders at four levels; only nine possible ports (leaf nodes) remain in the
result set (Kumar et al., 1997).

Zoom task 

We can zoom in on items of interest. Users typically have an interest in
some portion of a collection, and they need tools to enable them to control
the zoom focus and the zoom factor. Smooth zooming helps users
preserve their sense of position and context (Schaffer et al., 1996). A user can
zoom on one dimension at a time by moving the zoombar controls or by
adjusting the size of the field-of-view box. A satisfying way to zoom in is
to point to a location and to issue a zooming command, usually by
holding down a mouse button (Bederson and Hollan, 1993). Zooming in one
dimension has proved useful in starfield displays (Jog and Shneiderman,
1995).

Figure 15.14

A hyperbolic tree browser that allows 10 to 30 nodes near the center to be seen
clearly; branches are reduced gradually as they get closer to the periphery.
This display technique guarantees that large trees can be accommodated in a
fixed screen size. As the focus is shifted among nodes, the display updates
smoothly, producing a satisfying animation. Landmarks or other features can be
introduced to reduce the disorienting effect of movement (Lamping, Rao et al.,
1995). (Used with permission of InXight Software, Palo Alto, CA.)

Figure 15.15

The Dewey Decimal System shown as a treemap in which area indicates the number
of books held in each of the 1000 categories. Color indicates frequency of
utilization, with darker indicating high utilization (hot) and lighter
indicating low utilization. Implemented by Marko Teittinen at the University
of Maryland Human-Computer Interaction Laboratory.



Filter task 

We can filter out uninteresting items. Dynamic queries applied to the
items in the collection constitute one of the key ideas in information
visualization (Ahlberg et al., 1992; Williamson and Shneiderman, 1992; Kumar
et al., 1997). When users control the contents of the display, they can
quickly focus on their interests by eliminating unwanted items. Sliders,
buttons, or other control widgets coupled to rapid (less than 100
milliseconds) display update is the goal, even when there are tens of
thousands of displayed items.
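
The filtering step behind a dynamic query can be sketched as a range test
applied to every item each time a slider moves. The function and data below
are hypothetical and written only to illustrate the idea; a plain linear scan
like this one is an assumption of the sketch, and meeting the 100-millisecond
goal on very large collections may require indexing or other data structures.

```python
def dynamic_query(items, ranges):
    """Keep items whose every slider-controlled attribute lies in its range.

    `ranges` maps attribute name -> (low, high), both ends inclusive.
    """
    return [
        item for item in items
        if all(low <= item[attr] <= high
               for attr, (low, high) in ranges.items())
    ]


# Hypothetical home listings and slider settings.
homes = [
    {"id": 1, "price": 180_000, "bedrooms": 3},
    {"id": 2, "price": 320_000, "bedrooms": 4},
    {"id": 3, "price": 150_000, "bedrooms": 2},
]
sliders = {"price": (100_000, 200_000), "bedrooms": (3, 5)}
visible = dynamic_query(homes, sliders)
print([h["id"] for h in visible])

# Dragging the bedrooms slider down re-runs the same query.
sliders["bedrooms"] = (2, 5)
print([h["id"] for h in dynamic_query(homes, sliders)])
```

Re-running the whole query on every slider movement is what gives users the
sense of continuous, reversible control over the display.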

Details-on-demand task 

We can select an item or group to get details. Once a collection has been
trimmed to a few dozen items, it should be easy to browse the details
about the group or individual items. The usual approach is to simply click
on an item to get a pop-up window with values of each of the attributes.
In Spotfire (Color Plate B5), the details-on-demand window can contain
HTML text with links to further information.

Figure 15.16

Winsurfer treemap that shows the 4900 files at several levels on a hard disk.
Area is set to be proportional to file size and color to file type. Moving the
cursor over an area produces an immediate display of attribute values on the
bottom. Developed by Marko Teittinen at the University of Maryland
Human-Computer Interaction Laboratory.

Relate task 

We can view relationships among items. In the FilmFinder
details-on-demand window (Ahlberg and Shneiderman, 1994), users could select
an attribute, such as the film's director, and cause the director alphaslider
to be reset to the director's name, thereby displaying only films by that
director. Similarly, in SDM (Chuah et al., 1995), users can select an item
and then highlight items with similar attributes. In LifeLines (Color Plate
B3) (Plaisant et al., 1996), users can click on a medication and see the
related visit report, prescriptions, and laboratory test results. Designing
user-interface actions to specify which relationship is to be manifested is
still a challenge. The Influence Explorer (Tweedie et al., 1996) emphasizes
exploration of relationships among attributes. The Table Lens (Fig. 15.11)
emphasizes finding correlations among pairs of numerical attributes (Rao
and Card, 1994).
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Figure 15.17 

A fisheye view or variable zooming on a hierarchical network diagram. These
techniques can help to focus attention on details while preserving context.
(a) A central node has been selected for zooming. (b) The node is expanded by
a zoom factor of 3, exposing five nodes at the next level. (Schaffer et al.,
1996.)



History task

We can keep a history of actions to support undo, replay, and progressive
refinement. It is rare that a single user action produces the desired
outcome. Information exploration is inherently a process with many steps, so
keeping the history of actions and allowing users to retrace their steps is
important. However, most prototypes fail to deal with this requirement.
Maybe they are reflecting the current state of GUIs, but designers would
do better to model information-retrieval systems, which typically preserve
the sequence of searches so that these searches can be combined or refined.
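
One simple way to keep a history of actions is a linear list of states with a
movable position, which supports undo, redo, and replay. The class below is a
hypothetical sketch, not the design of any system named in this chapter; it
assumes each state is a complete snapshot of the query settings.

```python
class QueryHistory:
    """Linear history of query states with undo, redo, and replay."""

    def __init__(self, initial):
        self._states = [initial]
        self._position = 0

    @property
    def current(self):
        return self._states[self._position]

    def record(self, state):
        """Store a new state; any states that had been undone are dropped."""
        del self._states[self._position + 1:]
        self._states.append(state)
        self._position += 1

    def undo(self):
        """Step back one state, if possible, and return the current state."""
        if self._position > 0:
            self._position -= 1
        return self.current

    def redo(self):
        """Step forward one state, if possible, and return the current state."""
        if self._position < len(self._states) - 1:
            self._position += 1
        return self.current

    def replay(self):
        """States from the start up to the current one, in order."""
        return self._states[: self._position + 1]


# Hypothetical exploration session.
history = QueryHistory({})
history.record({"price": (0, 200)})
history.record({"price": (0, 200), "bedrooms": (3, 5)})
history.undo()
print(history.current)
print(history.replay())
```

Storing full snapshots keeps replay trivial; a design that must combine or
refine earlier searches might store the individual actions instead.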

Extract task 

We can allow extraction of subcollections and of the query parameters.
Once users have obtained the item or set of items that they desire, it
would be useful for them to be able to extract that set and to save it to a
file in a format that would facilitate other uses, such as sending by
electronic mail, printing, graphing, or insertion into a statistical or
presentation package. As an alternative to saving the set, they might want to
save, send, or print the settings for the control widgets. Few prototypes
support such actions, although Roth's recent work on Visage provides a
capability to extract sets of items and simply drag-and-drop them into the
next application window (Roth et al., 1996).

The attraction of visual displays, when compared to textual displays, is
that they make use of the remarkable human perceptual ability for [...]
coding. Highlighting techniques (for example, boldface text or [...],
inverse video, blinking, underscoring, or boxing) can be used to draw
attention to certain items in a field of thousands [...] display can allow
rapid selection, and feedback is apparent. [...]

Users have highly varied needs for filtering features. The dynamic-queries
approach of adjusting numeric range sliders, [...] categories, or buttons for
small sets of categories [...] actions (the sliders or buttons) and objects
(the query results in the task-domain display); the use of rapid, incremental
[...]; and the immediate display of feedback (less than 100 milliseconds).
[...] benefits are the prevention of syntax errors and an encouragement of
exploration. [...]

• Select a set of sliders from a large set of attributes.

• Specify greater than, less than, or greater than and less than.

• Deal with Boolean combinations of slider settings 

• Cope with tens of thousands of points. 

• Permit weighting of criteria. 

The dynamic-query approach to the chemical table of elements was tested
in an empirical comparison with a form-fillin query interface. The
counterbalanced-ordering within-subjects design with [...] chemistry
students showed strong advantages for the dynamic queries in terms
of faster performance and lower error rates (Ahlberg et al., 1992).

Commercial information-retrieval systems, such as DIALOG or FirstSearch,
permit complex Boolean expressions with parentheses, but their
widespread adoption has been inhibited by their difficulty of use. Numerous
proposals have been put forward to reduce the burden of specifying complex
Boolean expressions (Reisner, 1988). Part of the confusion stems from
informal English usage, in which a query such as "List all employees who live
in New York and Boston" usually would result in an empty list because the
"and" would be interpreted as an intersection; only employees who live in
both cities would qualify! In English, "and" usually expands the options; in
Boolean expressions, AND is used to narrow a set to the intersection of two
others. Similarly, in the English "I'd like Russian or Italian salad
dressing," the "or" is exclusive, indicating that you want one or the other
but not both; in Boolean expressions, an OR is inclusive, and is used to
expand a set.
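
The gap between English and Boolean usage can be made concrete with set
operations. In the sketch below, the employee names and customer identifiers
are hypothetical sample data; Python's &, |, and ^ operators on sets stand
for Boolean AND, inclusive OR, and exclusive or.

```python
# Hypothetical employee sets, keyed by city of residence.
lives_in = {
    "New York": {"Ames", "Baker"},
    "Boston": {"Chen", "Diaz"},
}

# Boolean AND is an intersection: nobody lives in both cities.
both = lives_in["New York"] & lives_in["Boston"]
print(both)

# What the English "and" usually means here is the union (Boolean OR).
either = lives_in["New York"] | lives_in["Boston"]
print(sorted(either))

# Hypothetical customers who would accept each salad dressing.
russian = {"c1", "c2"}
italian = {"c2", "c3"}

inclusive_or = russian | italian   # Boolean OR: accepts either or both
exclusive_or = russian ^ italian   # the everyday "or": one but not both
print(sorted(inclusive_or), sorted(exclusive_or))
```

The empty intersection in the first query is exactly the surprise that
novices encounter when they translate the English sentence word for word.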

The desire for full Boolean expressions, including nested parentheses and
NOT operators, has led to novel metaphors for query specification. Venn
diagrams (Michard, 1982) and decision tables (Greene et al., 1990) have been
used, but these representations become clumsy as query complexity increases.
To support arbitrarily complex Boolean expressions with a graphical
specification, we applied the metaphor of water flowing from left to right
through a series of filters, where each filter lets through only the
appropriate documents, and the flow paths indicate AND or OR (Young and
Shneiderman, 1993).

In this filter-flow model, ANDs are shown as a linear sequence of filters,
suggesting the successive application of required criteria. As the flow passes
through each filter, it is reduced, and the visual feedback shows a narrower
stream of water. In Fig. 15.18(a) a journal database containing 6741 articles
passes through the Date filter; about one-half of the articles satisfy the
Date requirements of 94 to 97 (years 1994 to 1997). Only about one quarter of
those articles pass through the Language filter, which selects English OR
French. Users can also specify ORs across attributes, by putting filters in
parallel flow paths (Fig. 15.18b). When the parallel flow paths converge, the
width reflects the size of the union of the document sets.
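
The filter-flow semantics can be sketched with two combinators: filters in
series behave as AND, and filters on parallel paths behave as OR. This is an
illustrative reading of the model, not the implementation of the prototype;
the function names, the document format with an "id" key, and the four sample
articles are all hypothetical.

```python
def make_filter(attribute, allowed):
    """A single filter that lets through documents with an allowed value."""
    allowed = set(allowed)
    return lambda docs: [d for d in docs if d[attribute] in allowed]


def series(*stages):
    """Filters in sequence: each stage narrows the flow (Boolean AND)."""
    def run(docs):
        for stage in stages:
            docs = stage(docs)
        return docs
    return run


def parallel(*paths):
    """Filters on parallel paths: the flows merge as a union (Boolean OR)."""
    def run(docs):
        keep = set()
        for path in paths:
            keep.update(d["id"] for d in path(docs))
        return [d for d in docs if d["id"] in keep]
    return run


# Hypothetical article records.
docs = [
    {"id": 1, "year": 95, "language": "English", "publisher": "ACM"},
    {"id": 2, "year": 91, "language": "French", "publisher": "APA"},
    {"id": 3, "year": 96, "language": "German", "publisher": "HFES"},
    {"id": 4, "year": 93, "language": "Spanish", "publisher": "AAAS"},
]

date = make_filter("year", range(94, 98))
language = make_filter("language", ["English", "French"])
publisher = make_filter("publisher", ["ACM", "ASIS", "HFES"])

# (Date between 94 and 97) AND (Language is English OR French)
query_and = series(date, language)
# (Date between 94 and 97) AND ((Publisher ...) OR (Language ...))
query_or = series(date, parallel(publisher, language))

for query in (query_and, query_or):
    result = query(docs)
    print([d["id"] for d in result], "width:", len(result) / len(docs))
```

The printed width is the fraction of the original flow that survives, which
corresponds to the narrowing stream shown in the visual feedback.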

Negation is handled by a NOT operator that, when selected, inverts all
currently selected items in a filter (Fig. 15.18b). In the example, NOT 91
allows about 80 percent of the articles to pass the Date filter. Clusters of
filters and flow paths (with one ingoing and one outgoing flow) can be made
into a single labeled filter. Creation of clusters ensures that the full query
can be shown on the display at once, and allows named clusters to be saved in
a library for later reuse.

The filter-flow approach has been shown to help novices and intermittent
users to specify complex Boolean expressions and to learn Boolean concepts.
A usability study was conducted with 20 subjects who had had little
experience using Boolean algebra. The prototype filter-flow interface was
preferred over a textual interface by all 20 subjects, and statistically
significant advantages emerged on comprehension and composition tasks.

Figure 15.18

(a) Filter-flow model for the query (Date between 94 to 97) AND (Language is
English OR French). The result set contains 868 articles. (b) Filter-flow
model for the query (Date NOT 91) AND ((Publisher is ACM OR ASIS OR HFES) OR
(Language is English OR French)). The result set contains 1332 articles.

Another form of filtering is to apply a user-constructed set of keywords to
dynamically generated information, such as incoming electronic-mail
messages, newspaper stories, or scientific journal articles (Belkin and Croft,
1992). The users create and store their profiles, which are evaluated each
time that a new document appears. Users can be notified by electronic mail
that a relevant document has appeared, or the results can be simply
collected into a file until the users seek them out. These approaches are a
modern version of a traditional information-retrieval strategy called
selective dissemination of information (SDI), which was used in the earliest
days of magnetic-tape distribution of document collections. Elaborate
strategies for using the user-supplied set of keywords include latent semantic
indexing, use of thesauri to find narrower or broader terms, and
natural-language parsing techniques (Foltz and Dumais, 1992). Use of these
strategies and term-frequency data can produce relevance rankings of retrieved
documents that are appealing to many users and are successful in increasing
the recall and precision of searches. A series of text-retrieval conferences
(TREC) organized by Donna Harman at the National Institute for Standards and
Technology (http://potomac.ncsl.nist.gov/TREC) has allowed developers
of research and commercial products to compare their strategies against a
large test collection of textual documents.
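
The profile-based filtering described above can be sketched as a keyword
match that runs whenever a new document arrives. The sketch assumes the
simplest possible strategy, exact overlap between lowercase keywords and the
words of the document; it does not attempt the more elaborate strategies
mentioned (latent semantic indexing, thesauri, or parsing), and the user names
and profiles are hypothetical.

```python
import re


def matching_profiles(profiles, document):
    """Return the users whose stored keywords appear in the new document.

    `profiles` maps user name -> set of lowercase keywords.
    """
    words = set(re.findall(r"[a-z]+", document.lower()))
    return sorted(user for user, keywords in profiles.items()
                  if words & keywords)


# Hypothetical stored profiles.
profiles = {
    "ana": {"treemap", "visualization"},
    "raj": {"boolean", "query"},
}

incoming = "A new treemap layout for browsing large file collections"
print(matching_profiles(profiles, incoming))
```

Each matched user could then be notified, or the document could be appended
to a per-user file for later review.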

A social form of filtering is collaborative filtering, in which groups of users
combine their evaluations to help one another find interesting items in a large
document collection (Resnick et al., 1994). Each user rates documents in terms
of their interest. Then, the system can suggest unread articles that are close
to the user's interests, as determined by matches with other people's
interests. This method can also be applied to movies, music, restaurants, and
so on. For example, if you rate six restaurants high, the algorithms will
provide you with other restaurants that were rated high by people who liked
your six restaurants. This strategy has an inherent appeal, and dozens of
systems have been built for organizational databases, news files, music
groups, and World Wide Web pages.
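
The restaurant example can be sketched as a small recommendation function.
This is one simple scoring scheme chosen for illustration, not the algorithm
of any cited system: an unrated item earns points from each other user who
rated it highly, weighted by how many of your highly rated items that user
also rated highly. The rating scale, the threshold of 4, and all names in the
sample data are assumptions.

```python
def recommend(ratings, user, liked=4):
    """Suggest items the user has not rated, best candidates first.

    `ratings` maps user -> {item: score}; scores at or above `liked`
    count as a high rating.
    """
    mine = {item for item, score in ratings[user].items() if score >= liked}
    scores = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        # How many of my favorites does this person also rate highly?
        overlap = sum(1 for item in mine if theirs.get(item, 0) >= liked)
        if overlap == 0:
            continue
        for item, score in theirs.items():
            if score >= liked and item not in ratings[user]:
                scores[item] = scores.get(item, 0) + overlap
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical restaurant ratings on a 1-to-5 scale.
ratings = {
    "you": {"Thai Hut": 5, "Luigi's": 4},
    "pat": {"Thai Hut": 5, "Luigi's": 5, "Casa Sol": 5},
    "lee": {"Thai Hut": 4, "Blue Fin": 5, "Casa Sol": 2},
    "kim": {"Burger Bar": 5},
}
print(recommend(ratings, "you"))
```

Users with no favorites in common contribute nothing, so their preferences
never reach the suggestion list.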



15.6 Practitioner's Summary

Improved user interfaces to traditional database-query and text- or
multimedia-document search will spawn appealing new products. Flexible queries
against complex text, sound, graphics, image, and video databases are
emerging. Novel graphical and direct-manipulation approaches to query
formulation and information visualization are now possible. Whereas
research prototypes have typically dealt with only one data type (one-, two-,
and three-dimensional data; temporal and multidimensional data; and tree
and network data), successful commercial products will have to accommodate
several. These products will need to provide smooth integration with
existing software and to support the full task list: overview, zoom, filter,
details-on-demand, relate, history, and extract. These methods are attractive
because they present information rapidly and allow user-controlled
exploration. If they are all to be fully effective, we will require advanced
data structures, high-resolution color displays, fast data retrieval, and
novel forms of user training. Many user interfaces for specifying advanced
filtering are being built and are worthy of evaluation for commercial
projects.



15.7 Researcher's Agenda 



Although the computer contributes to the information explosion, it is
potentially the magic lens for finding, sorting, filtering, and presenting the
relevant items. Search in complex structured documents, graphics, images,
sound, or video presents grand opportunities for the design of advanced user
interfaces and powerful search engines to find the needles in the haystacks
and [...] tools, such as [...] perspective walls [...] and validated by
user-interface researchers. [...] perceptual psychology (understanding [...])
[...] decision making (identifying tasks [...]) is needed, as are theoretical
foundations and [...] for choosing among the diverse emerging [...] would help
to sort out [...]. Finally, software toolkits for [...] innovative
visualizations would facilitate the exploration process.



World Wide Web Resources

The search services such as Alta Vista, Excite, Infoseek, and Lycos
provide remarkable but flawed access [...] World Wide Web [...]. Other
information retrieval topics such [...] collaborative [...]

http://www.aw.com/DTUI